-
Abstract: Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable Y given complex inputs X. Despite recent advances in machine learning and physics-based models, it remains challenging to assess whether an initial model is calibrated for all x, and when needed, to reshape the densities of y toward ‘instance-wise’ calibration. This paper introduces the local amortized diagnostics and reshaping of conditional densities (LADaR) framework and proposes a new computationally efficient algorithm (Cal-PIT) that produces interpretable local diagnostics and provides a mechanism for adjusting conditional density estimates (CDEs). Cal-PIT learns a single interpretable local probability–probability map from calibration data that identifies where and how the initial model is miscalibrated across feature space, which can be used to morph CDEs such that they are well-calibrated. We illustrate the LADaR framework on synthetic examples, including probabilistic forecasting from image sequences, akin to predicting storm wind speed from satellite imagery. Our main science application involves estimating the probability density functions of galaxy distances given photometric data, where Cal-PIT achieves better instance-wise calibration than all 11 other literature methods in a benchmark data challenge, demonstrating its utility for next-generation cosmological analyses. Code available as a Python package at https://github.com/lee-group-cmu/Cal-PIT.
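One way to read the "local probability–probability map" mentioned in this abstract is sketched below. The notation is an assumption made for illustration and is not taken from the abstract: \hat{F} denotes the initial model's conditional CDF, r the local map, and \tilde{F} the reshaped CDF.

```latex
% Probability integral transform (PIT) of an observation y at inputs x
\mathrm{PIT}(y; x) = \hat{F}(y \mid x)

% Local probability-probability map: the conditional CDF of the PIT at x
r(\gamma; x) = \Pr\bigl(\mathrm{PIT}(Y; X) \le \gamma \,\big|\, X = x\bigr),
\qquad \gamma \in [0, 1]

% Reshaped conditional CDF, obtained by composing the map with the initial CDF
\tilde{F}(y \mid x) = r\bigl(\hat{F}(y \mid x);\, x\bigr)
```

Under this reading, the initial model is calibrated at x exactly when r(\gamma; x) = \gamma for every \gamma, and departures of r from the identity indicate where and how the model is miscalibrated.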
-
Free, publicly-accessible full text available June 1, 2026
-
Land-use land-cover (LULC) change is one of the most important anthropogenic threats to biodiversity and ecosystem integrity. As a result, systematically generated annual regional, national, and global LULC map products, derived from the classification of satellite imagery, have become critical inputs for multiple scientific disciplines. The importance of quantifying pixel-level uncertainty to improve the robustness of downstream analyses has long been acknowledged, but this practice is still not widely adopted in the generation of these LULC products. The lack of uncertainty quantification is likely because most approaches that have been put forward for this task (e.g., bootstrapping) are too computationally intensive for large-scale analysis. In this article, we describe how conformal statistics can be used to quantify pixel-level uncertainty in a way that is not computationally intensive, is statistically rigorous despite relying on few assumptions, and can be used together with any classification algorithm that produces class probabilities. Our simulation results show how the size of the predictive sets created by conformal statistics can be used as an indicator of classification uncertainty at the pixel level. Our analysis based on data from the Brazilian Amazon reveals that both forest and water have high certainty, whereas pasture and the “natural (other)” category have substantial uncertainty. This information can guide additional ground-truth data collection, and the resulting raster, which combines the LULC classification with the uncertainty results, can be used to communicate transparently to downstream users which classified pixels have high or low uncertainty. Given the importance of systematic LULC maps and uncertainty quantification, we believe that this approach will find wide use in the remote sensing community.
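The idea that predictive-set size can serve as a per-pixel uncertainty indicator can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation: the synthetic "pixels", the four classes, the 1 - p(true class) score, and all function names are assumptions made for illustration only.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold from held-out calibration pixels (split conformal)."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Quantile level with the usual finite-sample correction
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))

def prediction_sets(probs, qhat):
    """Boolean mask (n_pixels, n_classes): True where a class is in the set."""
    return (1.0 - probs) <= qhat

# Synthetic stand-in for classifier output (hypothetical, not real LULC data)
rng = np.random.default_rng(0)
n_classes = 4

def simulate(n):
    labels = rng.integers(0, n_classes, size=n)
    logits = rng.normal(size=(n, n_classes))
    logits[np.arange(n), labels] += 2.0  # make the true class more likely
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs, labels

cal_probs, cal_labels = simulate(2000)
test_probs, test_labels = simulate(5000)

qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_sets(test_probs, qhat)
set_size = sets.sum(axis=1)  # per-pixel uncertainty indicator
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
print(f"empirical coverage: {coverage:.3f}, mean set size: {set_size.mean():.2f}")
```

In a mapping workflow, `set_size` would be reshaped to the raster grid, so pixels with larger sets are flagged as more uncertain. The only requirement on the classifier in this sketch is that it outputs class probabilities.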
-
Many astrophysical analyses depend on estimates of redshifts (a proxy for distance) determined from photometric (i.e., imaging) data alone. Inaccurate estimates of photometric redshift uncertainties can result in large systematic errors. However, probability distribution outputs from many photometric redshift methods do not follow the frequentist definition of a Probability Density Function (PDF) for redshift; that is, the fraction of times the true redshift falls between two limits z1 and z2 should be equal to the integral of the PDF between these limits. Previous works have used the global distribution of Probability Integral Transform (PIT) values to re-calibrate PDFs, but offsetting inaccuracies in different regions of feature space can conspire to limit the efficacy of the method. We leverage a recently developed regression technique that characterizes the local PIT distribution at any location in feature space to perform a local re-calibration of photometric redshift PDFs, resulting in calibrated predictive distributions. Though we focus on an example from astrophysics, our method can produce predictive distributions that are calibrated at all locations in feature space for any use case.
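The global PIT-based re-calibration that this abstract attributes to previous works can be sketched as follows. This toy example is an assumption-laden illustration, not the method of any cited paper: the Gaussian "redshift" values, the overconfident predicted width, and the helper names are all hypothetical.

```python
import math
import numpy as np

def gaussian_cdf(z, mean, sd):
    """CDF of a normal distribution, evaluated elementwise."""
    erf = np.vectorize(math.erf)
    return 0.5 * (1.0 + erf((np.asarray(z) - mean) / (sd * math.sqrt(2.0))))

def empirical_cdf(sample):
    """Return a function g -> fraction of sample values <= g."""
    s = np.sort(sample)
    return lambda g: np.searchsorted(s, g, side="right") / len(s)

def ks_to_uniform(u):
    """Largest gap between the empirical CDF of u and the Uniform(0, 1) CDF."""
    u = np.sort(u)
    n = len(u)
    upper = np.arange(1, n + 1) / n - u
    lower = u - np.arange(0, n) / n
    return float(max(upper.max(), lower.max()))

rng = np.random.default_rng(1)
n = 5000
z_true = rng.normal(1.0, 0.2, size=n)  # hypothetical true redshifts
pred_mean, pred_sd = 1.0, 0.1          # overconfident model: sd too small

# PIT value = predicted CDF evaluated at the true redshift.
# For a calibrated model these values are Uniform(0, 1).
pit = gaussian_cdf(z_true, pred_mean, pred_sd)

# Global re-calibration: learn the empirical CDF of PIT values on one half,
# then compose it with the predicted CDF on the other half.
pp_map = empirical_cdf(pit[: n // 2])
recal_pit = pp_map(pit[n // 2:])

ks_before = ks_to_uniform(pit[n // 2:])
ks_after = ks_to_uniform(recal_pit)
print(f"KS distance to uniform, before: {ks_before:.3f}, after: {ks_after:.3f}")
```

Because `pp_map` is a single function shared by every object, it can only correct the average miscalibration. If one region of feature space were overconfident and another underconfident, the pooled PIT values could look nearly uniform while each region remained miscalibrated, which is the limitation the abstract describes.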